Graph Neural Networks (GNN)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

1. Graph

  • Graphs G(V,E)
    • V: a set of vertices (nodes)
    • E: a set of edges (links, relations)
    • weight (edge property)
      • distance in a road network
      • strength of connection in a personal network
  • Graphs model any situation where you have objects and pairwise relations (symmetric or asymmetric) between the objects
Vertex      Edge                                      Type
People      like each other                           undirected
People      is the boss of                            directed
Tasks       cannot be processed at the same time      undirected
Computers   have a direct network connection          undirected
Airports    planes fly between them                   directed
Cities      can travel between them                   directed

1.1. Types of Graphs

Undirected Graph vs. Directed Graph

  • Undirected graph
    • Edges of an undirected graph point both ways between nodes
    • ex) Two-way road
  • Directed graph
    • A graph in which the edges are directed
    • ex) One-way road
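The distinction can be seen directly in networkx, the library used in the labs below; the two-node graphs `ug` and `dg` here are illustrative examples, not from the labs:

```python
import networkx as nx

# Undirected: one edge connects both directions (a two-way road)
ug = nx.Graph()
ug.add_edge('A', 'B')

# Directed: the edge only points from 'A' to 'B' (a one-way road)
dg = nx.DiGraph()
dg.add_edge('A', 'B')

print(ug.has_edge('B', 'A'))   # True: undirected edges point both ways
print(dg.has_edge('B', 'A'))   # False: only A -> B exists
```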

Weighted Graph

  • A graph with edges assigned costs or weights
  • Also called 'Network'
    • ex) connection between cities, length of road, circuit element capacity, communication network usage fee, etc.

Cycle vs. Acyclic Graph

  • Cycle
    • A simple path whose start node and end node are the same
      • Simple path: no repeating vertices in the path
  • Acyclic Graph
    • A graph without cycle

1.2. Graph Structure

Graph and Adjacency Matrix

  • A simple undirected graph consists of only nodes and edges
  • A graph can be represented as an adjacency matrix $A$
    • The adjacency matrix $A$ indicates the adjacent nodes of each node
  • A (number of nodes) $\times$ (number of nodes) matrix is needed to represent the adjacency matrix of an undirected graph
    • Symmetric matrix
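As a small illustration (a hypothetical 4-node graph, not one from the labs), the adjacency matrix of an undirected graph can be built and checked for symmetry with numpy:

```python
import numpy as np

n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (0, 2), (2, 3)]:   # edges of the example graph
    A[i, j] = 1
    A[j, i] = 1                         # undirected: mark both directions

print(A)
print(np.allclose(A, A.T))              # True: undirected graphs give a symmetric matrix
```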



2. Graph Neural Network

  • Machine learning techniques such as linear regression work well in Euclidean space, but not in non-Euclidean domains
    • Machine learning models assume that features are independent of each other, but in non-Euclidean domains such as trees or graphs the features are related to each other
  • The Graph Neural Network (GNN) is a method that can be applied to non-Euclidean domains
  • GNNs are classified into 4 classes
    • Recurrent Graph Neural Networks (RecGNNs)
    • Convolutional Graph Neural Networks (ConvGNNs)
    • Graph Autoencoders (GAEs)
    • Spatial-Temporal Graph Neural Networks (STGNNs)

3. Graph Convolution Network (GCN)

3.1. Convolution

  • As covered in the previous CNN lecture, a CNN has two characteristics: preserving the spatial structure and weight sharing
  • To apply convolution to a graph network, the graph also has to preserve these characteristics

Convolution Layer

  • In a CNN, the convolution layer preserves the spatial structure of the input
  • It convolves over all spatial locations
    • Each convolution layer extracts features



Weight Sharing

  • Weight sharing reduces the number of parameters
  • Within the same layer, the same filter is used throughout the image



3.2. Connection between CNN and GCN

  • GCNs perform a similar operation, where the model learns features by inspecting neighboring nodes
  • The major difference is that CNNs are built to operate on regular (Euclidean) structured data, while GCNs operate on graph data, where the number of node connections varies and the nodes are unordered (irregular, non-Euclidean structured data)




3.3. Update Hidden States in GCN

  • Similar to a CNN, a GCN updates each node using its adjacent nodes
  • Unlike a CNN, each node of a GCN has a different number of adjacent nodes
    • The adjacency matrix A indicates the adjacent nodes of each node
  • Basic process (or terminology) of a GNN
    • Message: information passed by neighboring nodes to the central node
    • Aggregate: collect information from neighboring nodes
    • Update: update the embedding by combining information from neighboring nodes and from the node itself
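The three steps can be sketched in numpy for a single node; the 3-node graph and embeddings here are hypothetical, and summation is used as one possible aggregation:

```python
import numpy as np

H = np.array([[1.0, 0.0],    # embedding of node 0
              [0.0, 2.0],    # embedding of node 1
              [3.0, 1.0]])   # embedding of node 2
neighbors_of_0 = [1, 2]      # node 0 is connected to nodes 1 and 2

# Message: each neighbor passes its current embedding
messages = [H[j] for j in neighbors_of_0]

# Aggregate: combine neighbor messages (a sum here)
agg = np.sum(messages, axis=0)

# Update: combine the aggregated message with the node's own embedding
H0_new = H[0] + agg
print(H0_new)   # [4. 3.]
```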

Update for Each Node





$$\begin{align*} H_1^{(l+1)} &= \sigma \left(H_1^{(l)}W^{(l)} + H_2^{(l)}W^{(l)} + H_3^{(l)}W^{(l)} + H_4^{(l)}W^{(l)} + b^{(l)} \right) \\ &= \sigma \left(\sum_{j \in N} H_j^{(l)}W^{(l)} + b^{(l)} \right) \\ \end{align*}$$



  • $H_1^{(l)}$: feature matrix of the first node in the $l^{th}$ layer
  • $W^{(l)}$: weight of the $l^{th}$ layer
    • Weight sharing: the same weight is shared within each layer
      • In the same layer, each node is updated in the same way, so it shares the same weight
      • Weight sharing reduces computational complexity and time
  • For each layer, the feature matrix and the weight matrix are multiplied to create the next feature matrix

Matrix Computation for Each Layer





  • The image above shows the matrix computation of the feature matrix and the weight matrix in each layer
    • The $(i,j)$ cell of the output matrix is the value obtained by applying the $j^{th}$ filter to the $i^{th}$ node
    • Like convolution, higher-dimensional information is extracted by applying filters in each layer
  • By multiplying the adjacency matrix A with the output matrix, each node is updated only with the information of the nodes connected to it
    • This enables the model to learn feature representations based on node connectivity


$$ H^{(l+1)} = \sigma \left(A H^{(l)}W^{(l)} + b^{(l)} \right) $$
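A quick numpy check of this update (a hypothetical 3-node path graph, with an identity weight, bias omitted, and ReLU as $\sigma$) shows how multiplying by $A$ sums each node's neighbor features:

```python
import numpy as np

# Hypothetical path graph: 0 - 1 - 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1.], [2.], [3.]])   # one feature per node
W = np.array([[1.]])               # identity weight for readability

def sigma(x):                      # ReLU activation
    return np.maximum(0, x)

H_next = sigma(A @ H @ W)          # each row sums the neighbors' features
print(H_next.ravel())              # [2. 4. 2.]: node 1 receives 1 + 3 = 4
```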

3.4. Readout - Permutation Invariance

  • The adjacency matrix can differ even though two graphs have the same network structure
    • Even if the edge information between all nodes is the same, the order of values in the matrix may differ due to rotation and symmetry
  • The readout layer makes the model permutation invariant by applying an MLP to each node and summing the results




Node-wise summation

$$ Z_G = \tau \left(\sum_{i \in G} MLP \left(H_i^{(L)} \right) \right) $$
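A minimal numpy sketch of this readout (with a hypothetical one-layer ReLU network standing in for the MLP, and $\tau$ taken as the identity) confirms that node-wise summation is permutation invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))          # hypothetical final node embeddings (5 nodes)
W = rng.normal(size=(4, 3))          # one-layer "MLP" weight, for illustration

def mlp(h):
    return np.maximum(0, h @ W)      # the same MLP is applied to every node

def readout(H):
    return np.sum([mlp(h) for h in H], axis=0)   # node-wise summation

perm = rng.permutation(5)            # reorder the nodes arbitrarily
print(np.allclose(readout(H), readout(H[perm])))   # True: node order does not matter
```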

3.5. Overall Structure of GCN





  • Graph information, the feature matrix and the adjacency matrix, is input to the GCN
  • Graph convolution layer
    • Updates the information of each node according to the adjacency matrix
  • The readout layer collects all node information with an MLP and produces a value for regression or classification

3.6. Three Types of GNN Problem

  • Task 1: Node classification

  • Task 2: Edge prediction

  • Task 3: Graph classification





4. Lab1: Building Simple Graph Convolutional Networks

  • A simple graph with 6 nodes
  • Learn the node feature representations by building a GCN
In [2]:
# !pip install networkx

4.1. Initializing the graph G

In [6]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import fractional_matrix_power
%matplotlib inline

G = nx.Graph(name = 'G')

for i in range(6):
    G.add_node(i, name = i)
    
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (3, 5), (4, 5)]
G.add_edges_from(edges)

print('Graph Info:\n', nx.info(G))

print('\nGraph Nodes: ', G.nodes.data())

nx.draw(G, with_labels = True, font_weight = 'bold')
plt.show()
Graph Info:
 Name: G
Type: Graph
Number of nodes: 6
Number of edges: 7
Average degree:   2.3333

Graph Nodes:  [(0, {'name': 0}), (1, {'name': 1}), (2, {'name': 2}), (3, {'name': 3}), (4, {'name': 4}), (5, {'name': 5})]

4.2. Insert the adjacency matrix (A) into the forward pass equation

In [2]:
A = np.array(nx.attr_matrix(G, node_attr='name')[0])
X = np.array(nx.attr_matrix(G, node_attr='name')[1])
X = np.expand_dims(X, axis=1)

print('shape of A: ', A.shape)
print('\nShape of X: ', X.shape)
print('\nAdjacency Matrix (A):\n', A)
print('\nNode Features Matrix (X):\n', X)
shape of A:  (6, 6)

Shape of X:  (6, 1)

Adjacency Matrix (A):
 [[0. 1. 1. 1. 0. 0.]
 [1. 0. 1. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 1. 1.]
 [0. 0. 0. 1. 0. 1.]
 [0. 0. 0. 1. 1. 0.]]

Node Features Matrix (X):
 [[0]
 [1]
 [2]
 [3]
 [4]
 [5]]
In [3]:
AX = np.dot(A, X)
print("Dot product of A and X (AX):\n", AX)
Dot product of A and X (AX):
 [[6.]
 [2.]
 [1.]
 [9.]
 [8.]
 [7.]]

4.3. Insert self-loops and normalize A

Degree of undirected graph

  • The degree of a vertex in a graph is the number of edges connected to it
  • Denote the degree of vertex $i$ by $d_i$
  • For an undirected graph of $n$ vertices
$$d_i = \sum_{j=1}^{n} A_{ij}$$
  • Degree matrix $D$ of $A$
$$D = \text{diag}\{d_1, d_2, \cdots \}$$
  • example





$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \qquad \Rightarrow \qquad D = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} $$
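The degree matrix in this example can be computed directly from the row sums of $A$:

```python
import numpy as np

# The 4-node example above: degrees are the row sums of A
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])

d = A.sum(axis=1)       # [3, 2, 3, 2]
D = np.diag(d)          # degree matrix D with d on the diagonal
print(d)
```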

Normalized $\tilde A$

  • Adding $I$ is to add self-connecting edges

  • Considering neighboring nodes in the normalized weights

$$\tilde A = \tilde D^{-1/2}(A+I)\tilde D^{-1/2} \qquad \text{where } \, \tilde D \, \text{ is the degree matrix of } A+I$$
  • Normalization prevents numerical instabilities and vanishing/exploding gradients, helping the model converge

Now we consider a feature matrix $X$

  • Weighted sum (or averaging) of neighboring features
$$\tilde A X = \left( D^{-1/2}(A+I)D^{-1/2}\right) X$$

Message passing framework

  • How a single node aggregates messages from its local neighborhood
$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( h_{u}^{(k)}, \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right)\\\\ H^{(k+1)} &= \sigma \left( H^{(k)} \, W_{\text{self}}^{(k)} + A H^{(k)} \, W_{\text{neigh}}^{(k)} \right) \end{align*} $$

Message passing with self-loops

  • As a simplification of the neural message passing approach, it is common to add self-loops to the input graph and omit the explicit update step
$$ \begin{align*} h_{u}^{(k+1)} &= \text{UPDATE} \left( h_{u}^{(k)}, \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \right\} \right) \right) \\ &= \text{UPDATE} \left( \text{AGGREGATE} \left( \left\{ h_{v}^{(k)}, \forall v \in \mathcal{N}(u) \cup \{u \}\right\} \right) \right) \\ \\ H^{(k+1)} &= \sigma \left( \left(A+I \right)H^{(k)} \, W^{(k)}\right) \end{align*} $$

Neighborhood normalization

  • The most basic neighborhood aggregation operation simply takes the sum of the neighbor embeddings. One issue with this approach is that it can be unstable and highly sensitive to node degrees.

  • One solution to this problem is to normalize the aggregation operation based on the degrees of the nodes involved. The simplest approach is to take an average rather than a sum

$$ \begin{align*} \tilde A &= D^{-1/2}AD^{-1/2} + I \\ & \approx \tilde D^{-1/2}(A+I) \tilde D^{-1/2} \qquad \text{where } \, \tilde D \, \text{ is the degree matrix of } A+I \end{align*} $$

Finally, the Graph Convolutional Network

$$ \begin{align*} H^{(k+1)} &= \sigma \left( \left(\tilde D^{-1/2}(A+I)\tilde D^{-1/2} \right)H^{(k)} \, W^{(k)}\right)\\ &= \sigma \left( \tilde A H^{(k)} \, W^{(k)}\right) \end{align*} $$

normalizing term = $D^{-1}\hat A X$, where $\hat A = A + I$

In [4]:
G_self_loops = G.copy()

self_loops = []
for i in range(G.number_of_nodes()):
    self_loops.append((i, i))
    
G_self_loops.add_edges_from(self_loops)

print('Edges of G with self-loops:\n', G_self_loops.edges)

A_hat = np.array(nx.attr_matrix(G_self_loops, node_attr='name')[0])
print('Adjacency Matrix of added self-loops G (A_hat):\n', A_hat)

AX = np.dot(A_hat, X)
print('AX:\n', AX)
Edges of G with self-loops:
 [(0, 1), (0, 2), (0, 3), (0, 0), (1, 2), (1, 1), (2, 2), (3, 4), (3, 5), (3, 3), (4, 5), (4, 4), (5, 5)]
Adjacency Matrix of added self-loops G (A_hat):
 [[1. 1. 1. 1. 0. 0.]
 [1. 1. 1. 0. 0. 0.]
 [1. 1. 1. 0. 0. 0.]
 [1. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 1. 1. 1.]]
AX:
 [[ 6.]
 [ 3.]
 [ 3.]
 [12.]
 [12.]
 [12.]]
In [5]:
Deg_Mat = G_self_loops.degree()
print('Degree Matrix of added self-loops G (D): ', Deg_Mat)

D = np.diag([deg for (n, deg) in list(Deg_Mat)])
print('Degree Matrix of added self-loops G as numpy  array (D):\n', D)

D_inv = np.linalg.inv(D)
print('Inverse of D:\n', D_inv)

DAX = np.dot(D_inv, AX)
print('DAX:\n', DAX)
Degree Matrix of added self-loops G (D):  [(0, 5), (1, 4), (2, 4), (3, 5), (4, 4), (5, 4)]
Degree Matrix of added self-loops G as numpy  array (D):
 [[5 0 0 0 0 0]
 [0 4 0 0 0 0]
 [0 0 4 0 0 0]
 [0 0 0 5 0 0]
 [0 0 0 0 4 0]
 [0 0 0 0 0 4]]
Inverse of D:
 [[0.2  0.   0.   0.   0.   0.  ]
 [0.   0.25 0.   0.   0.   0.  ]
 [0.   0.   0.25 0.   0.   0.  ]
 [0.   0.   0.   0.2  0.   0.  ]
 [0.   0.   0.   0.   0.25 0.  ]
 [0.   0.   0.   0.   0.   0.25]]
DAX:
 [[1.2 ]
 [0.75]
 [0.75]
 [2.4 ]
 [3.  ]
 [3.  ]]

normalizing term = $D^{-1/2}\hat A D^{-1/2}X$

In [6]:
D_half_norm = fractional_matrix_power(D, -0.5)
DADX = D_half_norm.dot(A_hat).dot(D_half_norm).dot(X)
print('DADX:\n', DADX)
DADX:
 [[1.27082039]
 [0.75      ]
 [0.75      ]
 [2.61246118]
 [2.92082039]
 [2.92082039]]

4.4. Adding weights and activation function

Adding a non-linear function: 1st layer

$$ \begin{align*} H &= f \left(X, \tilde A \right) \\ & = \sigma \left(\tilde A X \, W \right) \end{align*} $$

Adding a non-linear function: $l^{\text{th}}$ layer

$$ \begin{align*} H^{(l+1)} &= f \left(H^{(l)}, \tilde A \right) \\ & = \sigma \left(\tilde A H^{(l)} \, W \right) \end{align*} $$
In [7]:
np.random.seed(7777)
n_h = 4
n_y = 2
W0 = np.random.randn(X.shape[1], n_h) * 0.01
W1 = np.random.randn(n_h, n_y) * 0.01

def relu(x):
    return np.maximum(0, x)

def gcn(A, H, W):
    I = np.identity(A.shape[0])
    A_hat = A + I
    D = np.diag(np.sum(A_hat, axis=0))
    D_half_norm = fractional_matrix_power(D, -0.5)
    eq = D_half_norm.dot(A_hat).dot(D_half_norm).dot(H).dot(W)
    return relu(eq)

H1 = gcn(A, X, W0)
H2 = gcn(A, H1, W1)
print('Features Representation from GCN output:\n', H2)
Features Representation from GCN output:
 [[0.0002451  0.0002094 ]
 [0.00015274 0.00013049]
 [0.00015274 0.00013049]
 [0.00046813 0.00039996]
 [0.00047767 0.00040811]
 [0.00047767 0.00040811]]

4.5. Plotting the features representation

In [8]:
def plot_features(H2):
    x = H2[:,0]
    y = H2[:,1]
    
    size = 1000
    
    plt.scatter(x, y, size)
    plt.xlim([np.min(x) * 0.9, np.max(x) * 1.1])
    plt.ylim([-1, 1])
    
    for i, row in enumerate(H2):
        label = "{}".format(i)   # avoid shadowing the built-in str
        plt.annotate(label, (row[0], row[1]), fontsize=18, fontweight='bold')
        
    plt.show()
    
plot_features(H2)

5. Lab2: Node Classification using Graph Convolutional Networks

In [6]:
#importing dependencies

import numpy as np
import os
import networkx as nx
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.metrics import classification_report

from spektral.layers import GraphConv

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dropout, Dense
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import TensorBoard, EarlyStopping
import tensorflow as tf
from tensorflow.keras.regularizers import l2

from collections import Counter
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

5.1. Data loading

In [5]:
labels = np.load('./data_files/cora_labels.npy')
nodes = np.load('./data_files/cora_nodes.npy')
edge_list = np.load('./data_files/cora_edges.npy')
X = np.load('./data_files/cora_features.npy')

N = X.shape[0]
F = X.shape[1]

print('X shape: ', X.shape)
print('\nNumber of nodes (N): ', N)
print('\nNumber of features (F) of each node: ', F)
print('\nCategories: ', set(labels))

num_classes = len(set(labels))
print('\nNumber of classes: ', num_classes)
X shape:  (2708, 1433)

Number of nodes (N):  2708

Number of features (F) of each node:  1433

Categories:  {'Rule_Learning', 'Reinforcement_Learning', 'Case_Based', 'Genetic_Algorithms', 'Theory', 'Probabilistic_Methods', 'Neural_Networks'}

Number of classes:  7





5.2 Train / Validation / Test data splitting

Since a GCN is a semi-supervised method, it can be trained with a small amount of labeled data compared to fully supervised methods. Therefore, in this task, the training dataset will consist of 20 samples per class. Likewise, validation and testing are carried out with a 500-sample validation dataset and a 1000-sample test dataset.

In [7]:
def encode_label(labels):
    label_encoder = LabelEncoder()
    labels = label_encoder.fit_transform(labels)
    labels = to_categorical(labels)
    return labels, label_encoder.classes_

labels_encoded, classes = encode_label(labels)
In [8]:
label_counter = np.zeros((num_classes, 1))
train_idx = []
for i in range(len(labels_encoded)):
    label = np.argmax(labels_encoded[i])
    if label_counter[label] < 20:
        train_idx.append(i)
        label_counter[label] += 1

rest_idx = [x for x in range(len(labels)) if x not in train_idx]

val_idx = rest_idx[:500]
test_idx = rest_idx[500:(500 + 1000)]
In [9]:
train_mask = np.zeros((N,), dtype=bool)
train_mask[train_idx] = True

val_mask = np.zeros((N,), dtype=bool)
val_mask[val_idx] = True

test_mask = np.zeros((N,), dtype=bool)
test_mask[test_idx] = True
In [10]:
print("All Data Distribution: \n{}".format(Counter(labels)))
print("\n")
print("Training Data Distribution: \n{}".format(Counter([labels[i] for i in train_idx])))
print("\n")
print("Validation Data Distribution: \n{}".format(Counter([labels[i] for i in val_idx])))
print("\n")
print("Test Data Distribution: \n{}".format(Counter([labels[i] for i in test_idx])))
All Data Distribution: 
Counter({'Neural_Networks': 818, 'Probabilistic_Methods': 426, 'Genetic_Algorithms': 418, 'Theory': 351, 'Case_Based': 298, 'Reinforcement_Learning': 217, 'Rule_Learning': 180})


Training Data Distribution: 
Counter({'Reinforcement_Learning': 20, 'Probabilistic_Methods': 20, 'Neural_Networks': 20, 'Case_Based': 20, 'Theory': 20, 'Genetic_Algorithms': 20, 'Rule_Learning': 20})


Validation Data Distribution: 
Counter({'Neural_Networks': 172, 'Genetic_Algorithms': 78, 'Probabilistic_Methods': 72, 'Theory': 63, 'Case_Based': 58, 'Reinforcement_Learning': 35, 'Rule_Learning': 22})


Test Data Distribution: 
Counter({'Neural_Networks': 290, 'Probabilistic_Methods': 172, 'Genetic_Algorithms': 156, 'Theory': 123, 'Case_Based': 114, 'Reinforcement_Learning': 85, 'Rule_Learning': 60})

5.3 Initializing the graph G

In [12]:
G = nx.Graph(name='Cora')
G.add_nodes_from(nodes)
G.add_edges_from(edge_list)

print('Graph info: ', nx.info(G))
Graph info:  Graph named 'Cora' with 2708 nodes and 5278 edges

5.4 Construct and Normalize adjacency matrix A

5.4.1 Insert self-loops to A

In [14]:
A = nx.adjacency_matrix(G)

I = np.eye(A.shape[-1], dtype=A.dtype)
A_hat = A + I

5.4.2 normalizing term = $D^{-1/2}\hat A D^{-1/2}X$

In [17]:
degree = np.array(A_hat.sum(1))

D_half_norm = np.power(degree, -0.5).flatten()

D = np.diag(D_half_norm)

print('D:\n', D)

DAD = D.dot(A_hat).dot(D)
print('\nDAD:\n', DAD)

DAD = np.array(DAD, dtype=np.float32)
X = np.array(X, dtype=np.float32)
D:
 [[0.37796447 0.         0.         ... 0.         0.         0.        ]
 [0.         0.40824829 0.         ... 0.         0.         0.        ]
 [0.         0.         0.57735027 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.70710678 0.         0.        ]
 [0.         0.         0.         ... 0.         0.5        0.        ]
 [0.         0.         0.         ... 0.         0.         0.4472136 ]]

DAD:
 [[0.14285714 0.         0.         ... 0.         0.         0.        ]
 [0.         0.16666667 0.         ... 0.         0.         0.        ]
 [0.         0.         0.33333333 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.5        0.         0.        ]
 [0.         0.         0.         ... 0.         0.25       0.        ]
 [0.         0.         0.         ... 0.         0.         0.2       ]]

5.5 GCN Model

In [18]:
channels = 16
dropout = 0.5
l2_reg = 5e-4
learning_rate = 1e-2
epochs = 100
es_patience = 10

X_in = Input(shape=(F, ))
fltr_in = Input((N, ), sparse=True)

dropout_1 = Dropout(dropout)(X_in)
graph_conv_1 = GraphConv(channels,
                         activation='relu',
                         kernel_regularizer=l2(l2_reg),
                         use_bias=False)([dropout_1, fltr_in])

dropout_2 = Dropout(dropout)(graph_conv_1)
graph_conv_2 = GraphConv(num_classes,
                         activation='softmax',
                         use_bias=False)([dropout_2, fltr_in])

model = Model(inputs=[X_in, fltr_in], outputs=graph_conv_2)
optimizer = Adam(lr=learning_rate)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              weighted_metrics=['acc'])
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 1433)]       0                                            
__________________________________________________________________________________________________
dropout (Dropout)               (None, 1433)         0           input_1[0][0]                    
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 2708)]       0                                            
__________________________________________________________________________________________________
graph_conv (GraphConv)          (None, 16)           22928       dropout[0][0]                    
                                                                 input_2[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 16)           0           graph_conv[0][0]                 
__________________________________________________________________________________________________
graph_conv_1 (GraphConv)        (None, 7)            112         dropout_1[0][0]                  
                                                                 input_2[0][0]                    
==================================================================================================
Total params: 23,040
Trainable params: 23,040
Non-trainable params: 0
__________________________________________________________________________________________________

5.6 Train Model

In [19]:
validation_data = ([X, DAD], labels_encoded, val_mask)
model.fit([X, DAD],
          labels_encoded,
          sample_weight=train_mask,
          epochs=epochs,
          batch_size=N,
          validation_data=validation_data,
          shuffle=False,)
Epoch 1/100
1/1 [==============================] - 0s 298ms/step - loss: 0.1165 - acc: 0.1071 - val_loss: 0.3664 - val_acc: 0.2620
Epoch 2/100
1/1 [==============================] - 0s 180ms/step - loss: 0.1093 - acc: 0.3071 - val_loss: 0.3557 - val_acc: 0.4240
Epoch 3/100
1/1 [==============================] - 0s 180ms/step - loss: 0.1034 - acc: 0.4786 - val_loss: 0.3438 - val_acc: 0.5060
Epoch 4/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0970 - acc: 0.6000 - val_loss: 0.3324 - val_acc: 0.5180
Epoch 5/100
1/1 [==============================] - 0s 177ms/step - loss: 0.0910 - acc: 0.5929 - val_loss: 0.3228 - val_acc: 0.5240
Epoch 6/100
1/1 [==============================] - 0s 182ms/step - loss: 0.0851 - acc: 0.6714 - val_loss: 0.3142 - val_acc: 0.5360
Epoch 7/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0808 - acc: 0.6714 - val_loss: 0.3064 - val_acc: 0.5440
Epoch 8/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0791 - acc: 0.7000 - val_loss: 0.2986 - val_acc: 0.5500
Epoch 9/100
1/1 [==============================] - 0s 184ms/step - loss: 0.0765 - acc: 0.6643 - val_loss: 0.2900 - val_acc: 0.5520
Epoch 10/100
1/1 [==============================] - 0s 191ms/step - loss: 0.0731 - acc: 0.7000 - val_loss: 0.2808 - val_acc: 0.5780
Epoch 11/100
1/1 [==============================] - 0s 195ms/step - loss: 0.0707 - acc: 0.7357 - val_loss: 0.2723 - val_acc: 0.6280
Epoch 12/100
1/1 [==============================] - 0s 187ms/step - loss: 0.0683 - acc: 0.7571 - val_loss: 0.2642 - val_acc: 0.6840
Epoch 13/100
1/1 [==============================] - 0s 193ms/step - loss: 0.0689 - acc: 0.8143 - val_loss: 0.2568 - val_acc: 0.7160
Epoch 14/100
1/1 [==============================] - 0s 192ms/step - loss: 0.0646 - acc: 0.8429 - val_loss: 0.2498 - val_acc: 0.7340
Epoch 15/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0646 - acc: 0.9000 - val_loss: 0.2434 - val_acc: 0.7480
Epoch 16/100
1/1 [==============================] - 0s 186ms/step - loss: 0.0615 - acc: 0.9000 - val_loss: 0.2374 - val_acc: 0.7560
Epoch 17/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0596 - acc: 0.9000 - val_loss: 0.2321 - val_acc: 0.7680
Epoch 18/100
1/1 [==============================] - 0s 186ms/step - loss: 0.0610 - acc: 0.8929 - val_loss: 0.2276 - val_acc: 0.7720
Epoch 19/100
1/1 [==============================] - 0s 221ms/step - loss: 0.0602 - acc: 0.9071 - val_loss: 0.2239 - val_acc: 0.7660
Epoch 20/100
1/1 [==============================] - 0s 184ms/step - loss: 0.0567 - acc: 0.9286 - val_loss: 0.2202 - val_acc: 0.7640
Epoch 21/100
1/1 [==============================] - 0s 191ms/step - loss: 0.0607 - acc: 0.9214 - val_loss: 0.2170 - val_acc: 0.7640
Epoch 22/100
1/1 [==============================] - 0s 191ms/step - loss: 0.0519 - acc: 0.9286 - val_loss: 0.2139 - val_acc: 0.7680
Epoch 23/100
1/1 [==============================] - 0s 201ms/step - loss: 0.0554 - acc: 0.9214 - val_loss: 0.2106 - val_acc: 0.7700
Epoch 24/100
1/1 [==============================] - 0s 207ms/step - loss: 0.0534 - acc: 0.8929 - val_loss: 0.2069 - val_acc: 0.7740
Epoch 25/100
1/1 [==============================] - 0s 192ms/step - loss: 0.0490 - acc: 0.9500 - val_loss: 0.2031 - val_acc: 0.7780
Epoch 26/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0528 - acc: 0.9071 - val_loss: 0.1995 - val_acc: 0.7760
Epoch 27/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0503 - acc: 0.9143 - val_loss: 0.1965 - val_acc: 0.7860
Epoch 28/100
1/1 [==============================] - 0s 182ms/step - loss: 0.0500 - acc: 0.9571 - val_loss: 0.1939 - val_acc: 0.7840
Epoch 29/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0490 - acc: 0.9143 - val_loss: 0.1913 - val_acc: 0.7840
Epoch 30/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0479 - acc: 0.9143 - val_loss: 0.1895 - val_acc: 0.7820
Epoch 31/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0472 - acc: 0.9143 - val_loss: 0.1881 - val_acc: 0.7800
Epoch 32/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0459 - acc: 0.9429 - val_loss: 0.1868 - val_acc: 0.7800
Epoch 33/100
1/1 [==============================] - 0s 189ms/step - loss: 0.0485 - acc: 0.8714 - val_loss: 0.1845 - val_acc: 0.7800
Epoch 34/100
1/1 [==============================] - 0s 190ms/step - loss: 0.0457 - acc: 0.9429 - val_loss: 0.1817 - val_acc: 0.7820
Epoch 35/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0450 - acc: 0.9500 - val_loss: 0.1792 - val_acc: 0.7880
Epoch 36/100
1/1 [==============================] - 0s 181ms/step - loss: 0.0439 - acc: 0.9357 - val_loss: 0.1773 - val_acc: 0.7880
Epoch 37/100
1/1 [==============================] - 0s 178ms/step - loss: 0.0457 - acc: 0.9429 - val_loss: 0.1762 - val_acc: 0.7800
Epoch 38/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0419 - acc: 0.9500 - val_loss: 0.1759 - val_acc: 0.7740
Epoch 39/100
1/1 [==============================] - 0s 189ms/step - loss: 0.0445 - acc: 0.9143 - val_loss: 0.1755 - val_acc: 0.7760
Epoch 40/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0435 - acc: 0.9286 - val_loss: 0.1752 - val_acc: 0.7800
Epoch 41/100
1/1 [==============================] - 0s 179ms/step - loss: 0.0424 - acc: 0.9357 - val_loss: 0.1753 - val_acc: 0.7800
Epoch 42/100
1/1 [==============================] - 0s 181ms/step - loss: 0.0386 - acc: 0.9714 - val_loss: 0.1755 - val_acc: 0.7760
Epoch 43/100
1/1 [==============================] - 0s 182ms/step - loss: 0.0402 - acc: 0.9286 - val_loss: 0.1751 - val_acc: 0.7740
Epoch 44/100
1/1 [==============================] - 0s 200ms/step - loss: 0.0392 - acc: 0.9214 - val_loss: 0.1743 - val_acc: 0.7780
Epoch 45/100
1/1 [==============================] - 0s 180ms/step - loss: 0.0390 - acc: 0.9500 - val_loss: 0.1728 - val_acc: 0.7740
Epoch 46/100
1/1 [==============================] - 0s 190ms/step - loss: 0.0395 - acc: 0.9643 - val_loss: 0.1712 - val_acc: 0.7840
Epoch 47/100
1/1 [==============================] - 0s 185ms/step - loss: 0.0401 - acc: 0.9214 - val_loss: 0.1692 - val_acc: 0.7800
Epoch 48/100
1/1 [==============================] - 0s 197ms/step - loss: 0.0385 - acc: 0.9571 - val_loss: 0.1677 - val_acc: 0.7840
Epoch 49/100
1/1 [==============================] - 0s 177ms/step - loss: 0.0404 - acc: 0.9357 - val_loss: 0.1660 - val_acc: 0.7800
Epoch 50/100
1/1 [==============================] - 0s 187ms/step - loss: 0.0381 - acc: 0.9286 - val_loss: 0.1648 - val_acc: 0.7800
Epoch 51/100
1/1 [==============================] - 0s 203ms/step - loss: 0.0389 - acc: 0.9643 - val_loss: 0.1648 - val_acc: 0.7820
Epoch 52/100
1/1 [==============================] - 0s 183ms/step - loss: 0.0381 - acc: 0.9500 - val_loss: 0.1646 - val_acc: 0.7780
Epoch 53/100
1/1 [==============================] - 0s 195ms/step - loss: 0.0372 - acc: 0.9214 - val_loss: 0.1644 - val_acc: 0.7780
... (per-epoch output for epochs 54-99 omitted; training loss drifts down from ~0.037 to ~0.028 while validation accuracy fluctuates around 0.78) ...
Epoch 100/100
1/1 [==============================] - 0s 181ms/step - loss: 0.0287 - acc: 0.9786 - val_loss: 0.1562 - val_acc: 0.7820
Out[19]:
<tensorflow.python.keras.callbacks.History at 0x182c6f57708>

5.7 Model Evaluation

In [20]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np

def plot_confusion_matrix(y_true, y_pred, classes,
                          size=(10, 10),
                          fontsize=20,
                          cmap=plt.cm.Blues):

    plt.rc('font', size=fontsize)

    cm = confusion_matrix(y_true, y_pred)

    fig, ax = plt.subplots(figsize=size)
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           xticklabels=classes, yticklabels=classes,
           title='Confusion matrix for test dataset',
           ylabel='True label',
           xlabel='Predicted label')

    plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")

    # annotate each cell with its count; use white text on dark cells
    fmt = 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black", fontsize=20)
    fig.tight_layout()

    plt.show()
In [21]:
X_te = X[test_mask]
A_te = DAD[test_mask,:][:,test_mask]
y_te = labels_encoded[test_mask]

y_pred = model.predict([X_te, A_te], batch_size=N)
plot_confusion_matrix(np.argmax(y_te, axis=1), np.argmax(y_pred, axis=1), classes, fontsize=15)
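The confusion matrix summarizes per-class errors: the diagonal counts correct predictions, so overall accuracy is the trace divided by the total, and per-class recall is the diagonal divided by the row sums. A minimal sketch with a hypothetical 3-class matrix (not the Cora results above):

```python
import numpy as np

# hypothetical 3-class confusion matrix (rows: true labels, cols: predicted)
cm = np.array([[50,  2,  3],
               [ 4, 40,  6],
               [ 1,  5, 44]])

# overall accuracy = correct predictions (diagonal) / all predictions
accuracy = np.trace(cm) / cm.sum()

# per-class recall = correct per class / true count per class (row sums)
recall = np.diag(cm) / cm.sum(axis=1)

print(round(accuracy, 4))   # 0.8645
print(recall.round(3))      # [0.909 0.8   0.88 ]
```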

5.8 t-SNE

In [22]:
layer_outputs = [layer.output for layer in model.layers]
activation_model = Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict([X, DAD], batch_size=N)

# project the hidden GCN layer activations (layer index 3) down to 2-D
x_tsne = TSNE(n_components=2).fit_transform(activations[3])
In [23]:
def plot_tSNE(labels_encoded, x_tsne):
    # color each node by its class (argmax of the one-hot-encoded labels)
    color_map = np.argmax(labels_encoded, axis=1)
    plt.figure(figsize=(10, 10))
    for cl in range(num_classes):
        indices = np.where(color_map == cl)[0]
        plt.scatter(x_tsne[indices, 0], x_tsne[indices, 1], label=cl)
    plt.legend()
    plt.show()

plot_tSNE(labels_encoded, x_tsne)
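t-SNE maps the high-dimensional hidden activations to 2-D so that class clusters can be inspected visually; well-separated clusters suggest the GCN has learned discriminative node embeddings. A self-contained sketch on synthetic "activation" blobs (standing in for the Cora activations used above):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
# two synthetic 16-dimensional clusters standing in for GCN hidden features
x = np.vstack([rng.normal(0, 1, (30, 16)),
               rng.normal(5, 1, (30, 16))])

# n_components=2 for plotting; perplexity must be smaller than the sample count
x_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(x)
print(x_2d.shape)  # (60, 2)
```

Note that t-SNE preserves local neighborhoods rather than global distances, so cluster sizes and inter-cluster gaps in the plot should not be over-interpreted.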

6. Useful Resources for Further Study

In [1]:
%%html 
<center><iframe src="https://www.youtube.com/embed/fOctJB4kVlM?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [2]:
%%html 
<center><iframe src="https://www.youtube.com/embed/ABCGCf8cJOE?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [3]:
%%html 
<center><iframe src="https://www.youtube.com/embed/0YLZXjMHA-8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [4]:
%%html 
<center><iframe src="https://www.youtube.com/embed/ex2qllcVneY?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [5]:
%%html 
<center><iframe src="https://www.youtube.com/embed/YL1jGgcY78U?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [6]:
%%html 
<center><iframe src="https://www.youtube.com/embed/8owQBFAHw7E?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [7]:
%%html 
<center><iframe src="https://www.youtube.com/embed/R67-JxtOQzg?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

Libraries for TensorFlow: GraphNets, Spektral, DGL (or https://vermamachinelearning.github.io/keras-deep-graph-learning/)

In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')